

Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Neural Information Processing Systems

We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the overparametrized regime, where the layer width m is larger than the input dimension d. We consider a teacher-student scenario where the teacher has the same structure as the student, with a hidden layer of smaller width m* <= m. We describe how the empirical loss landscape is affected by the number n of data samples and the width m* of the teacher network. In particular, we determine how the probability that the empirical loss has no spurious minima depends on n, d, and m*, thereby establishing conditions under which the neural network can in principle recover the teacher. We also show that under the same conditions gradient descent dynamics on the empirical loss converges and leads to small generalization error, i.e. it enables recovery in practice. Finally, we characterize the time-convergence rate of gradient descent in the limit of a large number of samples. These results are confirmed by numerical experiments.
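As a rough illustration of the setup described in this abstract (not the paper's actual experiments), the teacher-student recovery with quadratic activations can be sketched in a few lines of NumPy; the dimensions, learning rate, and initialization scale below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, m_star, n = 10, 20, 3, 500   # input dim, student width, teacher width, samples

# Teacher: y(x) = sum_{j <= m*} (v_j . x)^2, a planted low-rank quadratic.
V = rng.standard_normal((m_star, d))
X = rng.standard_normal((n, d))          # Gaussian inputs
y = np.sum((X @ V.T) ** 2, axis=1)

# Overparametrized student (m > d): f(x) = sum_{i <= m} (w_i . x)^2.
W = 0.1 * rng.standard_normal((m, d))

def emp_loss(W):
    return 0.5 * np.mean((np.sum((X @ W.T) ** 2, axis=1) - y) ** 2)

init_loss = emp_loss(W)
lr = 1e-4
for _ in range(5000):
    resid = np.sum((X @ W.T) ** 2, axis=1) - y       # residuals, shape (n,)
    # gradient w.r.t. row w_i: (2/n) sum_k resid_k (w_i . x_k) x_k
    W -= lr * (2.0 / n) * ((W @ X.T) * resid) @ X
final_loss = emp_loss(W)
print(init_loss, final_loss)
```

With n large relative to d(d+1)/2 and m >= m*, full-batch gradient descent drives the empirical loss toward zero, consistent with the recovery regime the abstract describes.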


Review for NeurIPS paper: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Neural Information Processing Systems

Reviews for this paper are mixed; in particular, some reviewers were concerned about missing proofs. On the other hand, the paper studies an important problem and carries out a nice analysis that integrates numerical experiments, heuristic derivations, and rigorous proofs in a meaningful way, and the reader learns a lot about such models (quadratic 2-layer networks with sparse teacher). It is thus necessary that the authors spend a lot of effort writing the missing proofs thoroughly, because it will not be possible to review those proofs again (and of course all the other changes proposed in the rebuttal should be implemented). Overall, for such a paper that contains true statements, conjectures, and heuristics, it is very important to emphasize the "truth status" of each statement, and "true statements" should have a proof.


Review for NeurIPS paper: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Neural Information Processing Systems

For random initialization, I also believe the analysis still needs substantial effort. The upper bound on E(A(t)) clearly depends on the condition number of A(0), rather than simply splitting into full-rank and rank-deficient cases. Moreover, rather than focusing only on the full-rank case, the authors could treat the problem uniformly and continuously; e.g., the Marchenko-Pastur (MP) law from random matrix theory may help provide an asymptotic analysis for random initialization, since the limiting distribution of the eigenvalues is known. A non-asymptotic version may also exist, but more perturbation bounds would be needed. By the way, due to my research background, I had overlooked the literature on shallow neural networks with random Gaussian input. I apologize for that and raise my score.
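To make the reviewer's MP-law suggestion concrete, here is a minimal sketch, assuming (as an illustration, not something stated in the review) that the random initialization is A(0) = W(0)^T W(0)/m for Gaussian W(0); its spectrum then concentrates on the Marchenko-Pastur support, whose edges govern the condition number of A(0):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 200, 400                  # input dim, layer width; aspect ratio gamma = d/m
gamma = d / m

# For Gaussian W, A(0) = W^T W / m is a Wishart matrix; as d, m -> inf with
# d/m = gamma < 1 fixed, its eigenvalues follow the Marchenko-Pastur law
# supported on [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2].
W = rng.standard_normal((m, d))
eigs = np.linalg.eigvalsh(W.T @ W / m)

lam_minus = (1 - np.sqrt(gamma)) ** 2
lam_plus = (1 + np.sqrt(gamma)) ** 2
cond = eigs.max() / eigs.min()   # condition number of the random A(0)
print(eigs.min(), eigs.max(), lam_minus, lam_plus, cond)
```

For gamma < 1 the lower edge is bounded away from zero, so a random full-rank initialization has a condition number that is predictable asymptotically, which is exactly what would feed into a refined bound on E(A(t)).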




From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Liu, Fusheng, Yang, Haizhao, Hayou, Soufiane, Li, Qianxiao

arXiv.org Artificial Intelligence

Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for the gradient flow algorithm, and then give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.
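A toy illustration of the mechanism behind such results (not taken from the paper): the one-dimensional loss L(x) = x^4 satisfies a Łojasiewicz gradient inequality |L'(x)| = 4 L(x)^{3/4}, i.e. exponent 3/4 rather than the Polyak-Łojasiewicz value 1/2, so gradient flow converges at a polynomial rather than exponential rate. For x(0) = 1 the flow obeys d(x^{-2})/dt = 8, giving the exact rate L(x(t)) = (1 + 8t)^{-2}, which a forward-Euler discretization reproduces:

```python
# Gradient flow dx/dt = -L'(x) on L(x) = x^4, discretized by forward Euler.
# Since d(x^{-2})/dt = -2 x^{-3} * (-4 x^3) = 8, the exact flow gives
# L(x(t)) = (1 + 8 t)^{-2} for x(0) = 1: a polynomial convergence rate,
# matching the Lojasiewicz exponent 3/4 > 1/2.
x, dt, T = 1.0, 1e-4, 10.0
for _ in range(int(T / dt)):
    x -= dt * 4 * x ** 3
L_T = x ** 4
exact = (1 + 8 * T) ** -2
print(L_T, exact)
```

Exponents larger than 1/2 in the gradient inequality translate precisely into these slower polynomial rates, which is why the convergence analysis in the framework is organized around the LGI exponent.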


Interplay Between Optimization and Generalization of Stochastic Gradient Descent with Covariance Noise

Wen, Yeming, Luk, Kevin, Gazeau, Maxime, Zhang, Guodong, Chan, Harris, Ba, Jimmy

arXiv.org Machine Learning

The choice of batch size in a stochastic optimization algorithm plays a substantial role in both optimization and generalization. Increasing the batch size typically improves optimization but degrades generalization. To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients. We demonstrate that the optimization performance of our method is more accurately captured by the structure of the noise covariance matrix than by the variance of the gradients. Moreover, for the convex quadratic case, we prove that it can be characterized by the Frobenius norm of the noise matrix. Our empirical studies with standard deep learning model architectures and datasets show that our method not only improves generalization performance in large-batch training, but does so in a way where the optimization performance remains desirable and the training duration is not prolonged.
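A hedged sketch of the idea (the loss, dimensions, and schedule below are illustrative assumptions, not the authors' setup): on a linear regression problem, run full-batch gradient descent but inject Gaussian noise whose covariance matches the per-sample gradient covariance scaled down by the batch size, mimicking the noise structure of small-batch SGD without sacrificing the large-batch gradient estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1024, 5
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

def per_sample_grads(theta):
    # gradients of 0.5 * (x . theta - y)^2 for every sample
    r = X @ theta - y
    return r[:, None] * X

theta = np.zeros(d)
B, lr = n, 0.05                  # "large batch" = the whole dataset here
for _ in range(200):
    g = per_sample_grads(theta)
    mean_g = g.mean(axis=0)
    C = np.cov(g.T) / B          # covariance gradient noise would have at batch size B
    noise = rng.multivariate_normal(np.zeros(d), C)
    theta -= lr * (mean_g + noise)
print(np.linalg.norm(theta - theta_true))
```

The injected noise inherits the anisotropic structure of the per-sample gradients, which is the quantity the abstract argues governs the method's behavior, rather than the raw gradient variance.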


Optimization in Machine Learning: Robust or global minimum?

@machinelearnbot

We understand that in convex problems it is much easier to find the global optimum. We appreciate the opportunity to participate in this discussion. KD, MF: No, the convexified problem can have a minimum that is quite different from the original problem. The motivation for our paper comes from the fact that in many problems (like control and reinforcement learning) one is interested in a "robust" minimum (a minimum such that the cost does not increase much when you perturb the parameters). Our method destroys non-robust minima and preserves a single robust minimum of the problem.